Overview

Dataset statistics

Number of variables: 11
Number of observations: 922
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 79.4 KiB
Average record size in memory: 88.1 B

Variable types

Numeric: 7
Categorical: 4

Alerts

df_index is highly correlated with Payment Method Granted and 2 other fields (High correlation)
Credit Limit Granted is highly correlated with Commercial Risk Cover Protected and 4 other fields (High correlation)
Commercial Risk Group Code is highly correlated with Credit Limit Granted and 4 other fields (High correlation)
IDCliente is highly correlated with df_index and 2 other fields (High correlation)
VENTASCLIENTE is highly correlated with NArticulos and 1 other field (High correlation)
NArticulos is highly correlated with VENTASCLIENTE and 1 other field (High correlation)
NContratos is highly correlated with VENTASCLIENTE and 1 other field (High correlation)
Commercial Risk Cover Protected is highly correlated with Credit Limit Granted and 4 other fields (High correlation)
Payment Method Granted is highly correlated with df_index and 6 other fields (High correlation)
Payment Terms Granted is highly correlated with df_index and 6 other fields (High correlation)
Status Code is highly correlated with Credit Limit Granted and 4 other fields (High correlation)
df_index has unique values (Unique)
IDCliente has unique values (Unique)
Credit Limit Granted has 492 (53.4%) zeros (Zeros)
Commercial Risk Group Code has 533 (57.8%) zeros (Zeros)
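
The percentages in the two zero alerts can be checked from the counts and the 922 observations reported above. The sketch below is only an illustration of that arithmetic; the helper name `zero_share` is hypothetical and not part of the profiling tool.

```python
# Counts and row total copied from this report.
N_ROWS = 922
ZERO_COUNTS = {
    "Credit Limit Granted": 492,
    "Commercial Risk Group Code": 533,
}


def zero_share(count: int, total: int) -> float:
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100 * count / total, 1)


# Hypothetical helper applied to each alerted column.
shares = {name: zero_share(count, N_ROWS) for name, count in ZERO_COUNTS.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}% zeros")
```

Both values agree with the alert text (53.4% and 57.8%).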

Reproduction

Analysis started: 2022-11-01 18:03:45.133829
Analysis finished: 2022-11-01 18:03:55.189431
Duration: 10.06 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
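
As a small consistency check, the reported duration follows from the two timestamps. This is a sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2022-11-01 18:03:45.133829")
finished = datetime.fromisoformat("2022-11-01 18:03:55.189431")

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```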

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 922
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1514.378525
Minimum: 1
Maximum: 3891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 92.05
Q1: 439.25
median: 1204.5
Q3: 2458
95-th percentile: 3606.65
Maximum: 3891
Range: 3890
Interquartile range (IQR): 2018.75

Descriptive statistics

Standard deviation: 1178.149944
Coefficient of variation (CV): 0.7779758658
Kurtosis: -1.122984452
Mean: 1514.378525
Median Absolute Deviation (MAD): 911
Skewness: 0.4755752134
Sum: 1396257
Variance: 1388037.291
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
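
Several of the derived statistics for df_index follow directly from the others. The sketch below recomputes them from the figures reported above, as an illustration of how the quantities relate; it does not reproduce the profiler's own code, and the variable names are chosen here for readability.

```python
import math

# Values copied from the df_index summary in this report.
n_rows = 922
mean, std, variance = 1514.378525, 1178.149944, 1388037.291
minimum, maximum, q1, q3 = 1, 3891, 439.25, 2458

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
total = mean * n_rows            # Sum = mean * number of observations

print(value_range, iqr, round(cv, 6), round(total))
# The variance is the square of the standard deviation (up to rounding).
print(math.isclose(std ** 2, variance, rel_tol=1e-6))
```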
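Common values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2066 | 1 | 0.1%
2080 | 1 | 0.1%
2081 | 1 | 0.1%
2088 | 1 | 0.1%
2090 | 1 | 0.1%
2096 | 1 | 0.1%
2101 | 1 | 0.1%
2103 | 1 | 0.1%
2106 | 1 | 0.1%
Other values (912) | 912 | 98.9%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%
12 | 1 | 0.1%
13 | 1 | 0.1%
15 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
3891 | 1 | 0.1%
3888 | 1 | 0.1%
3886 | 1 | 0.1%
3875 | 1 | 0.1%
3874 | 1 | 0.1%
3859 | 1 | 0.1%
3855 | 1 | 0.1%
3845 | 1 | 0.1%
3833 | 1 | 0.1%
3832 | 1 | 0.1%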

Credit Limit Granted
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16419.7397
Minimum: 0
Maximum: 45000
Zeros: 492
Zeros (%): 53.4%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 45000
95-th percentile: 45000
Maximum: 45000
Range: 45000
Interquartile range (IQR): 45000

Descriptive statistics

Standard deviation: 20070.84523
Coefficient of variation (CV): 1.222360744
Kurtosis: -1.505293659
Mean: 16419.7397
Median Absolute Deviation (MAD): 0
Skewness: 0.5909489081
Sum: 15139000
Variance: 402838828.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Common values

Value | Count | Frequency (%)
0 | 492 | 53.4%
45000 | 278 | 30.2%
10000 | 43 | 4.7%
20000 | 27 | 2.9%
25000 | 18 | 2.0%
15000 | 13 | 1.4%
5000 | 11 | 1.2%
8000 | 7 | 0.8%
28000 | 6 | 0.7%
32000 | 5 | 0.5%
Other values (6) | 22 | 2.4%

Minimum values

Value | Count | Frequency (%)
0 | 492 | 53.4%
5000 | 11 | 1.2%
8000 | 7 | 0.8%
10000 | 43 | 4.7%
12000 | 3 | 0.3%
15000 | 13 | 1.4%
18000 | 4 | 0.4%
20000 | 27 | 2.9%
22000 | 4 | 0.4%
25000 | 18 | 2.0%

Maximum values

Value | Count | Frequency (%)
45000 | 278 | 30.2%
38000 | 3 | 0.3%
35000 | 5 | 0.5%
32000 | 5 | 0.5%
30000 | 3 | 0.3%
28000 | 6 | 0.7%
25000 | 18 | 2.0%
22000 | 4 | 0.4%
20000 | 27 | 2.9%
18000 | 4 | 0.4%

Commercial Risk Cover Protected
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

Value | Count
0.0 | 492
95.0 | 424
50.0 | 6

Length

Max length: 4
Median length: 3
Mean length: 3.46637744
Min length: 3

Characters and Unicode

Total characters: 3196
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 95.0
4th row: 0.0
5th row: 95.0

Common Values

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
95.0 | 424 | 46.0%
50.0 | 6 | 0.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
95.0 | 424 | 46.0%
50.0 | 6 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 1420 | 44.4%
. | 922 | 28.8%
5 | 430 | 13.5%
9 | 424 | 13.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2274 | 71.2%
Other Punctuation | 922 | 28.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1420 | 62.4%
5 | 430 | 18.9%
9 | 424 | 18.6%

Other Punctuation
Value | Count | Frequency (%)
. | 922 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3196 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1420 | 44.4%
. | 922 | 28.8%
5 | 430 | 13.5%
9 | 424 | 13.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3196 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1420 | 44.4%
. | 922 | 28.8%
5 | 430 | 13.5%
9 | 424 | 13.3%

Commercial Risk Group Code
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.622559653
Minimum: 0
Maximum: 7
Zeros: 533
Zeros (%): 57.8%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 4
95-th percentile: 6
Maximum: 7
Range: 7
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.167435116
Coefficient of variation (CV): 1.335812284
Kurtosis: -0.8030320798
Mean: 1.622559653
Median Absolute Deviation (MAD): 0
Skewness: 0.8815205815
Sum: 1496
Variance: 4.697774983
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
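Common values

Value | Count | Frequency (%)
0 | 533 | 57.8%
5 | 120 | 13.0%
4 | 71 | 7.7%
2 | 55 | 6.0%
3 | 54 | 5.9%
1 | 41 | 4.4%
6 | 37 | 4.0%
7 | 11 | 1.2%

Minimum values

Value | Count | Frequency (%)
0 | 533 | 57.8%
1 | 41 | 4.4%
2 | 55 | 6.0%
3 | 54 | 5.9%
4 | 71 | 7.7%
5 | 120 | 13.0%
6 | 37 | 4.0%
7 | 11 | 1.2%

Maximum values

Value | Count | Frequency (%)
7 | 11 | 1.2%
6 | 37 | 4.0%
5 | 120 | 13.0%
4 | 71 | 7.7%
3 | 54 | 5.9%
2 | 55 | 6.0%
1 | 41 | 4.4%
0 | 533 | 57.8%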

Payment Method Granted
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

Value | Count
0.0 | 492
99.0 | 430

Length

Max length: 4
Median length: 3
Mean length: 3.46637744
Min length: 3

Characters and Unicode

Total characters: 3196
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 99.0
4th row: 0.0
5th row: 99.0

Common Values

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
99.0 | 430 | 46.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
99.0 | 430 | 46.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 1414 | 44.2%
. | 922 | 28.8%
9 | 860 | 26.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2274 | 71.2%
Other Punctuation | 922 | 28.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1414 | 62.2%
9 | 860 | 37.8%

Other Punctuation
Value | Count | Frequency (%)
. | 922 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3196 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1414 | 44.2%
. | 922 | 28.8%
9 | 860 | 26.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3196 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1414 | 44.2%
. | 922 | 28.8%
9 | 860 | 26.9%

Payment Terms Granted
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

Value | Count
0.0 | 492
180.0 | 430

Length

Max length: 5
Median length: 3
Mean length: 3.932754881
Min length: 3

Characters and Unicode

Total characters: 3626
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 180.0
4th row: 0.0
5th row: 180.0

Common Values

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
180.0 | 430 | 46.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0.0 | 492 | 53.4%
180.0 | 430 | 46.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 1844 | 50.9%
. | 922 | 25.4%
1 | 430 | 11.9%
8 | 430 | 11.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2704 | 74.6%
Other Punctuation | 922 | 25.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1844 | 68.2%
1 | 430 | 15.9%
8 | 430 | 15.9%

Other Punctuation
Value | Count | Frequency (%)
. | 922 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3626 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1844 | 50.9%
. | 922 | 25.4%
1 | 430 | 11.9%
8 | 430 | 11.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3626 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1844 | 50.9%
. | 922 | 25.4%
1 | 430 | 11.9%
8 | 430 | 11.9%

Status Code
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

Value | Count
2.0 | 492
66.0 | 425
8.0 | 5

Length

Max length: 4
Median length: 3
Mean length: 3.460954447
Min length: 3

Characters and Unicode

Total characters: 3191
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 66.0
4th row: 2.0
5th row: 66.0

Common Values

Value | Count | Frequency (%)
2.0 | 492 | 53.4%
66.0 | 425 | 46.1%
8.0 | 5 | 0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
2.0 | 492 | 53.4%
66.0 | 425 | 46.1%
8.0 | 5 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
. | 922 | 28.9%
0 | 922 | 28.9%
6 | 850 | 26.6%
2 | 492 | 15.4%
8 | 5 | 0.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2269 | 71.1%
Other Punctuation | 922 | 28.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 922 | 40.6%
6 | 850 | 37.5%
2 | 492 | 21.7%
8 | 5 | 0.2%

Other Punctuation
Value | Count | Frequency (%)
. | 922 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3191 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
. | 922 | 28.9%
0 | 922 | 28.9%
6 | 850 | 26.6%
2 | 492 | 15.4%
8 | 5 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3191 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 922 | 28.9%
0 | 922 | 28.9%
6 | 850 | 26.6%
2 | 492 | 15.4%
8 | 5 | 0.2%

IDCliente
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 922
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26380795.55
Minimum: 110007
Maximum: 63500002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 110007
5-th percentile: 1340502
Q1: 7087505
median: 24745004
Q3: 41877503.25
95-th percentile: 59105502.1
Maximum: 63500002
Range: 63389995
Interquartile range (IQR): 34789998.25

Descriptive statistics

Standard deviation: 19359415.68
Coefficient of variation (CV): 0.7338450291
Kurtosis: -1.225196293
Mean: 26380795.55
Median Absolute Deviation (MAD): 17580000
Skewness: 0.2746874114
Sum: 2.432309349 × 10^10
Variance: 3.747869753 × 10^14
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
110007 | 1 | 0.1%
36230002 | 1 | 0.1%
36420003 | 1 | 0.1%
36430003 | 1 | 0.1%
36510002 | 1 | 0.1%
36540002 | 1 | 0.1%
36660006 | 1 | 0.1%
36710002 | 1 | 0.1%
36730004 | 1 | 0.1%
36770003 | 1 | 0.1%
Other values (912) | 912 | 98.9%

Minimum values

Value | Count | Frequency (%)
110007 | 1 | 0.1%
130030 | 1 | 0.1%
140010 | 1 | 0.1%
150003 | 1 | 0.1%
180003 | 1 | 0.1%
190002 | 1 | 0.1%
200004 | 1 | 0.1%
220005 | 1 | 0.1%
230004 | 1 | 0.1%
250008 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
63500002 | 1 | 0.1%
63450002 | 1 | 0.1%
63420002 | 1 | 0.1%
63240002 | 1 | 0.1%
63220002 | 1 | 0.1%
62960002 | 1 | 0.1%
62860002 | 1 | 0.1%
62700002 | 1 | 0.1%
62480002 | 1 | 0.1%
62470002 | 1 | 0.1%

VENTASCLIENTE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 891
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 729.5120912
Minimum: 50.25
Maximum: 4743.43
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 50.25
5-th percentile: 71.71
Q1: 205.45
median: 394.79
Q3: 927.075625
95-th percentile: 2489.53485
Maximum: 4743.43
Range: 4693.18
Interquartile range (IQR): 721.625625

Descriptive statistics

Standard deviation: 822.9740137
Coefficient of variation (CV): 1.128115659
Kurtosis: 4.494816529
Mean: 729.5120912
Median Absolute Deviation (MAD): 251.6275
Skewness: 2.083626812
Sum: 672610.1481
Variance: 677286.2272
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
225.45 | 3 | 0.3%
205.45 | 3 | 0.3%
71.71 | 3 | 0.3%
90.9 | 3 | 0.3%
178.885 | 2 | 0.2%
173.72 | 2 | 0.2%
52.12 | 2 | 0.2%
59.59 | 2 | 0.2%
121.2 | 2 | 0.2%
110.09 | 2 | 0.2%
Other values (881) | 898 | 97.4%

Minimum values

Value | Count | Frequency (%)
50.25 | 1 | 0.1%
50.5 | 2 | 0.2%
50.88 | 2 | 0.2%
51.51 | 1 | 0.1%
51.616 | 1 | 0.1%
52.12 | 2 | 0.2%
52.52 | 1 | 0.1%
53.28 | 1 | 0.1%
53.6512 | 1 | 0.1%
54.237 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
4743.43 | 1 | 0.1%
4493.27 | 1 | 0.1%
4363.853 | 1 | 0.1%
4265.42 | 1 | 0.1%
4242.75 | 1 | 0.1%
4011.915 | 1 | 0.1%
4008.7124 | 1 | 0.1%
3989.625 | 1 | 0.1%
3958.899 | 1 | 0.1%
3900.3345 | 1 | 0.1%

NArticulos
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 346
Distinct (%): 37.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.38088937
Minimum: 1
Maximum: 471.75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4.05
Q1: 11
median: 29
Q3: 78.78125
95-th percentile: 254.9375
Maximum: 471.75
Range: 470.75
Interquartile range (IQR): 67.78125

Descriptive statistics

Standard deviation: 85.00419076
Coefficient of variation (CV): 1.320332658
Kurtosis: 5.391693684
Mean: 64.38088937
Median Absolute Deviation (MAD): 23
Skewness: 2.276235304
Sum: 59359.18
Variance: 7225.712446
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
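Common values

Value | Count | Frequency (%)
5 | 76 | 8.2%
6 | 25 | 2.7%
4 | 23 | 2.5%
3 | 22 | 2.4%
13 | 21 | 2.3%
12 | 20 | 2.2%
7 | 20 | 2.2%
9 | 20 | 2.2%
10 | 17 | 1.8%
8 | 16 | 1.7%
Other values (336) | 662 | 71.8%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 22 | 2.4%
4 | 23 | 2.5%
5 | 76 | 8.2%
6 | 25 | 2.7%
7 | 20 | 2.2%
8 | 16 | 1.7%
9 | 20 | 2.2%
10 | 17 | 1.8%

Maximum values

Value | Count | Frequency (%)
471.75 | 1 | 0.1%
459.75 | 1 | 0.1%
442.25 | 1 | 0.1%
439 | 1 | 0.1%
438 | 1 | 0.1%
436 | 1 | 0.1%
425.5 | 1 | 0.1%
417.5 | 1 | 0.1%
415.75 | 1 | 0.1%
405 | 1 | 0.1%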

NContratos
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 127
Distinct (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.52711497
Minimum: 1
Maximum: 179
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 7
median: 15
Q3: 33
95-th percentile: 98
Maximum: 179
Range: 178
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 32.30449964
Coefficient of variation (CV): 1.173551957
Kurtosis: 5.234898291
Mean: 27.52711497
Median Absolute Deviation (MAD): 10
Skewness: 2.256570701
Sum: 25380
Variance: 1043.580697
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
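Common values

Value | Count | Frequency (%)
5 | 87 | 9.4%
6 | 76 | 8.2%
10 | 40 | 4.3%
7 | 38 | 4.1%
4 | 36 | 3.9%
12 | 31 | 3.4%
9 | 30 | 3.3%
16 | 28 | 3.0%
13 | 25 | 2.7%
11 | 24 | 2.6%
Other values (117) | 507 | 55.0%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 2 | 0.2%
3 | 23 | 2.5%
4 | 36 | 3.9%
5 | 87 | 9.4%
6 | 76 | 8.2%
7 | 38 | 4.1%
8 | 21 | 2.3%
9 | 30 | 3.3%
10 | 40 | 4.3%

Maximum values

Value | Count | Frequency (%)
179 | 1 | 0.1%
174 | 1 | 0.1%
166 | 1 | 0.1%
165 | 1 | 0.1%
164 | 1 | 0.1%
161 | 1 | 0.1%
160 | 2 | 0.2%
159 | 1 | 0.1%
158 | 1 | 0.1%
153 | 1 | 0.1%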

Interactions

Correlations

Auto

The auto setting selects an easily interpretable pairwise metric according to the types of the two columns: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V (computed on a discretized numerical column), and numerical-numerical pairs use Spearman's ρ. Each pair of columns therefore gets the method best suited to it.
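
The mapping can be written as a small dispatch function. This is a minimal sketch of the rule as stated in this report, not the profiler's implementation; the function name `auto_method` and the returned labels are hypothetical.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the metric the auto setting would pick for a pair of column types."""
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    kinds = {type_a, type_b}
    if kinds == {"numerical"}:
        return "spearman"       # numerical-numerical: Spearman's rho
    if kinds == {"categorical"}:
        return "cramers_v"      # categorical-categorical: Cramer's V
    # Mixed pair: Cramer's V after discretizing the numerical column.
    return "cramers_v_discretized"


print(auto_method("numerical", "categorical"))
```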

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
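
The calculation described above can be sketched in plain Python. This is an illustrative implementation, assuming tied values receive the average of their ranks (a common convention); the function names are chosen here and are not the profiler's API.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relationship still gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```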

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
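
A short sketch of that formula, together with a check of the invariance property mentioned above. The sample data and the name `pearson_r` are made up for illustration, and the rescaling uses positive factors only.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # the 1/n factors cancel between numerator and denominator


# Illustrative data, not taken from the profiled dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling each variable separately (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Exactly linear relationships with different slopes both give r close to 1.
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a + 5 for a in x])
print(r, r_rescaled, r_steep, r_shallow)
```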

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
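
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; libraries such as SciPy default to tau-b, which additionally adjusts for ties, so results can differ on tied data. The function name is chosen here for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described above."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```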

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
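
A sketch of the bias-corrected estimator, following the formula as it is commonly stated for Bergsma's correction. It takes a contingency table of counts and assumes every row and column total is positive and that the table has at least two rows and two columns; the function name and the toy tables are illustrative only.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent toy table
print(cramers_v_corrected([[20, 0], [0, 20]]))    # perfectly associated toy table
```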

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

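 | df_index | Credit Limit Granted | Commercial Risk Cover Protected | Commercial Risk Group Code | Payment Method Granted | Payment Terms Granted | Status Code | IDCliente | VENTASCLIENTE | NArticulos | NContratos
0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 110007 | 830.1700 | 121.00 | 121
1 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 130030 | 2402.7208 | 373.00 | 165
2 | 4 | 45000.0 | 95.0 | 1.0 | 99.0 | 180.0 | 66.0 | 140010 | 2044.0675 | 202.50 | 94
3 | 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 150003 | 295.3400 | 13.00 | 13
4 | 8 | 20000.0 | 95.0 | 3.0 | 99.0 | 180.0 | 66.0 | 180003 | 239.8400 | 12.00 | 12
5 | 9 | 45000.0 | 95.0 | 3.0 | 99.0 | 180.0 | 66.0 | 190002 | 388.8300 | 24.25 | 9
6 | 10 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 200004 | 154.5300 | 6.00 | 6
7 | 12 | 45000.0 | 95.0 | 2.0 | 99.0 | 180.0 | 66.0 | 220005 | 339.2500 | 39.00 | 24
8 | 13 | 45000.0 | 95.0 | 3.0 | 99.0 | 180.0 | 66.0 | 230004 | 591.3400 | 108.50 | 41
9 | 15 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 250008 | 983.5400 | 115.00 | 44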

Last rows

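 | df_index | Credit Limit Granted | Commercial Risk Cover Protected | Commercial Risk Group Code | Payment Method Granted | Payment Terms Granted | Status Code | IDCliente | VENTASCLIENTE | NArticulos | NContratos
912 | 3832 | 45000.0 | 95.0 | 2.0 | 99.0 | 180.0 | 66.0 | 62470002 | 418.50 | 5.0 | 5
913 | 3833 | 45000.0 | 95.0 | 6.0 | 99.0 | 180.0 | 66.0 | 62480002 | 151.50 | 5.0 | 5
914 | 3845 | 10000.0 | 95.0 | 6.0 | 99.0 | 180.0 | 66.0 | 62700002 | 133.23 | 14.0 | 12
915 | 3855 | 45000.0 | 95.0 | 5.0 | 99.0 | 180.0 | 66.0 | 62860002 | 411.88 | 61.0 | 7
916 | 3859 | 45000.0 | 95.0 | 3.0 | 99.0 | 180.0 | 66.0 | 62960002 | 177.72 | 5.0 | 5
917 | 3874 | 20000.0 | 95.0 | 5.0 | 99.0 | 180.0 | 66.0 | 63220002 | 225.45 | 5.0 | 5
918 | 3875 | 38000.0 | 95.0 | 4.0 | 99.0 | 180.0 | 66.0 | 63240002 | 253.74 | 9.0 | 9
919 | 3886 | 45000.0 | 95.0 | 5.0 | 99.0 | 180.0 | 66.0 | 63420002 | 191.85 | 17.0 | 6
920 | 3888 | 45000.0 | 95.0 | 3.0 | 99.0 | 180.0 | 66.0 | 63450002 | 426.34 | 62.0 | 10
921 | 3891 | 45000.0 | 95.0 | 5.0 | 99.0 | 180.0 | 66.0 | 63500002 | 246.54 | 5.0 | 5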